STATS 32 Session 7: Making Maps in R

Kenneth Tay

Oct 23, 2018

Announcements

Recap of week 3

Some notes about projects

Agenda for today

Making maps in R: maps package

Example: Drawing CA and its counties (1)

library(ggplot2)
library(maps)
library(dplyr)
library(readr)
county_data <- map_data("county")
CA_data <- county_data %>% filter(region == "california")
head(CA_data)
##        long      lat group order     region subregion
## 1 -121.4785 37.48290   157  6965 california   alameda
## 2 -121.5129 37.48290   157  6966 california   alameda
## 3 -121.8853 37.48290   157  6967 california   alameda
## 4 -121.8968 37.46571   157  6968 california   alameda
## 5 -121.9254 37.45998   157  6969 california   alameda
## 6 -121.9483 37.47717   157  6970 california   alameda

Example: Drawing CA and its counties (2)

County outlines are drawn using geom_polygon.

ggplot(data = CA_data) + 
    geom_polygon(mapping = aes(x = long, y = lat, group = group))

Example: Drawing CA and its counties (3)

coord_quickmap() preserves the aspect ratio of the map.

ggplot(data = CA_data) + 
    geom_polygon(mapping = aes(x = long, y = lat, group = group)) + 
    coord_quickmap()

Example: Drought levels in CA (1)

drought_data <- read_csv("Drought for Session 7.csv")
head(drought_data, n = 3)
## # A tibble: 3 x 2
##   County  Drought_percent
##   <chr>             <dbl>
## 1 alameda             100
## 2 alpine              100
## 3 amador              100
head(CA_data, n = 3)
##        long     lat group order     region subregion
## 1 -121.4785 37.4829   157  6965 california   alameda
## 2 -121.5129 37.4829   157  6966 california   alameda
## 3 -121.8853 37.4829   157  6967 california   alameda

Our drought data and mapping information are in different datasets!

Example: Drought levels in CA (2)

Solution: Use joins (in dplyr package).

combined_data <- CA_data %>%
    left_join(drought_data, by = c("subregion" = "County"))

Example: Drought levels in CA (3)

head(combined_data)
##        long      lat group order     region subregion Drought_percent
## 1 -121.4785 37.48290   157  6965 california   alameda             100
## 2 -121.5129 37.48290   157  6966 california   alameda             100
## 3 -121.8853 37.48290   157  6967 california   alameda             100
## 4 -121.8968 37.46571   157  6968 california   alameda             100
## 5 -121.9254 37.45998   157  6969 california   alameda             100
## 6 -121.9483 37.47717   157  6970 california   alameda             100

Example: Drought levels in CA (4)

Map the fill attribute of geom_polygon to the Drought_percent column.

ggplot(data = combined_data) + 
    geom_polygon(mapping = aes(x = long, y = lat, 
            group = group, fill = Drought_percent)) + 
    coord_quickmap()

Example: Drought levels in CA (5)

Use scale_fill_distiller to define a more appropriate color scale.

ggplot(data = combined_data) + 
    geom_polygon(mapping = aes(x = long, y = lat, 
            group = group, fill = Drought_percent)) + 
    scale_fill_distiller(palette = "YlOrRd", direction = 1) +
    coord_quickmap()

Today’s dataset: 2016 US Presidential Elections









Optional material

Other types of joins

Inner join: Matches pairs of observations with equal keys, drops everything else. Hence, only keeps observations which appear in both datasets.

Other types of joins

After matching pairs of observations with equal keys…